Abstract:

Molecular networks guide the biochemistry of a living cell on multiple levels: 
its metabolic and signalling pathways are shaped by the network of inter- 
acting proteins, whose production, in turn, is controlled by the genetic regu- 
latory network. To address topological properties of these two networks we 
quantify correlations between connectivities of interacting nodes and compare
them to a null model of a network, in which all links were randomly
rewired. We find that for both interaction and regulatory networks, links between
highly connected proteins are systematically suppressed, while those
between highly connected and low-connected pairs of proteins are favored.
This effect decreases the likelihood of cross talk between different functional 
modules of the cell, and increases the overall robustness of a network by 
localizing effects of deleterious perturbations. 

With the growth of experimental information about basic biochemical 
mechanisms of life, molecular networks operating in living cells are becom- 
ing better defined. Direct physical interactions between pairs of proteins 
form one such network. It serves as a backbone for functional and structural 
relationships among its nodes and defines pathways for the propagation of 
various signals such as phosphorylation and allosteric regulation of proteins. 
The information about specific binding of proteins to each other has recently 
grown by an unprecedented amount as a result of high throughput two-hybrid 
experiments [1]. The production and degradation of proteins participating
in the interaction network is controlled by the genetic regulatory network 
of the cell formed by all pairs of proteins in which the first protein directly 
regulates the abundance of the second. The majority of known cases of such 
regulation happens at the level of transcription, in which a transcription factor
positively or negatively regulates the RNA transcription of the controlled
protein. The large scale structure of both these networks is characterized 
by a high degree of interconnectedness, where most pairs of nodes are linked 
to each other by at least one path. One may wonder how such a heavily 
intertwined and mutually dependent dynamical system can perform multiple 
functional tasks, and remain stable against deleterious perturbations. 

We analyzed the topological properties of interaction and transcription 
regulatory networks in yeast Saccharomyces cerevisiae, which at present is 
perhaps the best characterized model organism. The interaction network 
used in this work consists of 4549 physical interactions between 3278 yeast 
proteins as measured in the most comprehensive two-hybrid screen of yeast 
proteins, while the genetic regulatory network is formed by 1289 directed
positive or negative direct transcriptional regulations within a set of 682
proteins as listed in the YPD database. The protein interaction network
is a representative of the broad class of scale-free networks [4], [5], [6], in which
the number of nodes with a given number of neighbors (connectivity) $K$
scales as a power law $\propto 1/K^{\gamma}$. In our case the histogram of connectivities
can be fitted by a power law with $\gamma = 2.5 \pm 0.3$ for $K$ ranging from 2 to
about 100 [7]. A small part of the protein interaction network, formed by
proteins known to be localized in the nucleus and to interact with at least 
one other nuclear protein, was visualized (Fig. 1). One striking feature of 
this graph is the abundance of highly connected proteins that are mostly 
connected to those with low connectivity, and thus well separated from each 
other. 

To test for correlations in connectivities of nodes for each of the above 
two networks we calculated the likelihood $P(K_0, K_1)$ that two proteins with
connectivities $K_0$ and $K_1$ are connected to each other by a link and compared
it to the same quantity $P_r(K_0, K_1)$ measured in a randomized version
of the same network. In this "null model" network all proteins have exactly 
the same connectivity as in the original one, while the choice of their inter- 
action partners is totally random. The transcription regulatory network is 
naturally directed, while the network of physical interactions among proteins 
in principle lacks directionality. However, for poorly understood reasons the 
two-hybrid experimental data have a significant asymmetry between baits 
and preys, with bait hybrids being more likely to be highly connected than 
their prey counterparts. This can be seen e.g. in the fact that average con- 
nectivity of baits with at least one interaction partner is close to 3, whereas 
the same quantity measured for preys is only 1.8. Since each reported interaction
involves one bait and one prey protein, this asymmetry needs to be
taken into account when constructing an uncorrelated "null" model for the 
interaction network. For this purpose our randomization procedure treats
the two-hybrid data as a directed network with an arrow on each
edge pointing from the bait to the prey hybrid. Randomized versions of these
two networks were constructed by randomly reshuffling links, while keeping 
the in- and out-degree of each node constant. A convenient numerical algorithm
performing such randomization consists of first randomly selecting a
pair of directed edges A→B and C→D. The two edges are then rewired in
such a way that A becomes connected to D, and C to B. However, if
one or both of these new links already exist in the network, this step is
aborted and a new pair of edges is selected. This last restriction prevents the
appearance of multiple edges connecting the same pair of nodes. A repeated 
application of the above rewiring step leads to a randomized version of the 
original network. Multiple sampling of randomized networks allowed us to 
calculate both the average expectation and the standard deviation for any 
particular property of the random network. 
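
The rewiring step described above can be sketched in a few lines of Python. This is a minimal illustration and not the code used in this work: the function names, the `seed` and `max_tries` parameters, and the toy edge list are assumptions, and self-loops (which the text does not discuss) are not treated specially.

```python
import random
from collections import Counter

def rewire(edges, n_swaps, seed=None, max_tries=None):
    """Degree-preserving randomization of a directed edge list.

    edges: list of (source, target) pairs without duplicates
    (at least two edges are required).
    Repeatedly picks two edges A->B and C->D and replaces them with
    A->D and C->B, skipping the step if either new edge already exists.
    """
    rng = random.Random(seed)
    edges = list(edges)
    present = set(edges)
    if max_tries is None:
        max_tries = 100 * n_swaps  # hypothetical safety cap
    done = tries = 0
    while done < n_swaps and tries < max_tries:
        tries += 1
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        # Abort this step if it would create a multiple edge.
        if (a, d) in present or (c, b) in present:
            continue
        present.discard((a, b))
        present.discard((c, d))
        present.add((a, d))
        present.add((c, b))
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges

def degree_sequences(edges):
    """Out-degree and in-degree of every node, as two Counters."""
    return Counter(a for a, _ in edges), Counter(b for _, b in edges)

toy = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 0), (3, 4), (4, 1)]
shuffled = rewire(toy, n_swaps=20, seed=1)
```

Calling `rewire` many times with different seeds yields an ensemble of randomized networks from which the mean and standard deviation of any network property can be estimated.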

Correlations in connectivities manifest themselves as systematic deviations
of the ratio $P(K_0, K_1)/P_r(K_0, K_1)$ from 1. We calculated this ratio
for interaction (Fig. 2A) and regulatory (Fig. 2B) networks, with $K_0$ and
$K_1$ being the total number of interaction partners of two interacting proteins
(for the interaction network), and out- and in-degrees of two nodes
connected by a directed edge 0→1 (for the regulatory network). Thus by
the very construction $P(K_0, K_1)$ is symmetric for the physical interaction
network but not for the regulatory network. We also estimated the statistical
significance $Z(K_0, K_1)$ of the above deviations in the interaction (Fig.
2C) and regulatory (Fig. 2D) networks, by dividing each observed deviation
from the null model by the standard deviation in multiple realizations of a
randomized network. The combination of these two plots reveals the regions
on the $K_0$-$K_1$ plane where connections between proteins in the real network
are significantly enhanced or suppressed, compared to the null model.
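
As an illustration of how the ratio and the Z-score could be computed, the sketch below counts edges in logarithmic bins of (out-degree of source, in-degree of target), as appropriate for the directed regulatory network. It assumes that the randomized networks preserve every node's degrees, so that the ratio of probabilities reduces to a ratio of edge counts. The function names, the use of the population standard deviation, and the toy data are assumptions made for this example.

```python
import math
from collections import Counter
from statistics import mean, pstdev

def log_bin(k, bins_per_decade=2):
    """Index of the logarithmic bin that contains degree k >= 1."""
    return int(math.floor(bins_per_decade * math.log10(k)))

def edge_counts(edges, bins_per_decade=2):
    """Count edges by (bin of source out-degree, bin of target in-degree)."""
    k_out = Counter(a for a, _ in edges)
    k_in = Counter(b for _, b in edges)
    return Counter(
        (log_bin(k_out[a], bins_per_decade), log_bin(k_in[b], bins_per_decade))
        for a, b in edges
    )

def correlation_profile(real_edges, random_edge_lists, bins_per_decade=2):
    """Return {bin pair: (ratio, z)} for a non-empty list of null networks."""
    real = edge_counts(real_edges, bins_per_decade)
    rand = [edge_counts(e, bins_per_decade) for e in random_edge_lists]
    profile = {}
    for cell in set(real).union(*rand):
        samples = [r[cell] for r in rand]  # missing cells count as 0
        mu, sigma = mean(samples), pstdev(samples)
        ratio = real[cell] / mu if mu > 0 else math.inf
        z = (real[cell] - mu) / sigma if sigma > 0 else math.nan
        profile[cell] = (ratio, z)
    return profile

# Toy input that only exercises the arithmetic; it is not a real null model.
real = [(i, i + 10) for i in range(6)]
profile = correlation_profile(real, [real[:2], real[:4]])
```

For the undirected interaction network one would replace the out- and in-degrees by the total number of interaction partners of each protein and symmetrize the counts.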
In particular red regions in the upper left and the lower right corners reflect 
the tendency of highly connected nodes (hubs) to associate with nodes of low 
connectivity, while the blue/green region in the upper right corner reflects 
the reduced likelihood that two hub centers are directly linked to each other. 
One should also note a prominent feature on the diagonal of Figs. 2A and
2C, corresponding to an enhanced affinity of proteins with between 4 and 9
interaction partners to physically interact with each other. This feature can 
be tentatively attributed to the tendency of members of multi-protein com- 
plexes to interact with other proteins from the same complex. The above 
range of connectivities thus corresponds to a typical number of direct interaction
partners of a protein in a complex. When we checked for interactions
between proteins in this range of connectivities we found that 39 pairs of
interacting proteins belong to the same complex in a recent high throughput
study, which is 4 times more than one would expect to find by pure chance
alone.

To further quantify and compare correlation patterns in interaction and
regulatory networks we calculated the average connectivity $\langle K_1 \rangle$ of nearest
neighbors of a node, as a function of its own connectivity $K_0$ (Fig. 3A).
In order to simplify the comparison between two networks here we characterize
each node in the regulatory network by its total number of neighbors
$K = K_{in} + K_{out}$. For both interaction and regulatory networks the average
connectivity $\langle K_1 \rangle$ shows a gradual decline with $K_0$, which can be fitted with
a power law $\langle K_1 \rangle \propto 1/K_0^{0.6 \pm 0.1}$ over approximately two decades. This observation
lends additional support to the similarity between the correlation patterns
in these two protein networks visible in Fig. 2. It was recently found [10]
that the internet, defined as the set of interconnected routers, in addition
to a scale-free distribution of node connectivities similar to the protein interaction
network, is characterized by the same correlation pattern between
connectivities of neighboring nodes: $\langle K_1 \rangle \propto 1/K_0^{0.5}$. This extends by one
step an intriguing similarity in the topology of these networks of completely
different nature.
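
The average neighbor connectivity as a function of a node's own connectivity can be computed as in the following sketch. It treats the network as undirected and ignores self-loops, which are assumptions of the example. The function name and the toy star graph are illustrative only.

```python
from collections import defaultdict

def neighbor_connectivity(edges):
    """Return {K0: average connectivity of neighbors of nodes with degree K0}.

    edges: iterable of (a, b) pairs, interpreted as undirected links.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:  # self-loops are ignored in this sketch
            neighbors[a].add(b)
            neighbors[b].add(a)
    degree = {node: len(nbrs) for node, nbrs in neighbors.items()}
    total = defaultdict(float)
    count = defaultdict(int)
    for node, nbrs in neighbors.items():
        for other in nbrs:
            total[degree[node]] += degree[other]
            count[degree[node]] += 1
    return {k: total[k] / count[k] for k in total}

# A star: the hub (degree 4) sees only leaves, the leaves see only the hub.
star = [(0, 1), (0, 2), (0, 3), (0, 4)]
knn = neighbor_connectivity(star)
```

In the star graph the hub's neighbors have connectivity 1 and the leaves' neighbor has connectivity 4, the extreme case of the anticorrelation discussed in the text.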

For the scale-free physical interaction network we also plotted the probability
distribution of the nearest neighbor connectivity $K_1$, measured separately
for nodes with small connectivity $K_0 < 3$, and for those with large connectivity
$K_0 > 100$ (Fig. 3B). In the absence of correlations this conditional
probability does not depend on $K_0$, and is proportional to $K_1/K_1^{\gamma} \approx 1/K_1^{1.5}$.
This uncorrelated form holds approximately true for neighbors of a protein
with low connectivity. It is only violated at the far tail of the distribution
due to an excess likelihood of it being connected to a protein with very high
connectivity, as was mentioned above. On the other hand, the distribution of
connectivities $K_1$ of neighbors of highly connected proteins scales as $\propto 1/K_1^{2.5}$
and thus differs from that of lowly connected ones by a factor of $1/K_1$.
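
The uncorrelated form quoted above follows from a short standard argument, sketched here under the assumption that the number of nodes with connectivity $K$ is $N(K) \propto K^{-\gamma}$ with $\gamma = 2.5$. The symbol $N(K)$ and the conditional notation $P(K_1 \mid K_0)$ are introduced only for this illustration.

```latex
% A randomly chosen link ends at a node of connectivity K_1 with
% probability proportional to K_1 N(K_1), independently of K_0:
\[
  P(K_1 \mid K_0) \;=\; \frac{K_1\, N(K_1)}{\sum_{K} K\, N(K)}
  \;\propto\; K_1 \cdot K_1^{-\gamma} \;=\; K_1^{\,1-\gamma}
  \;\approx\; K_1^{-1.5}.
\]
% The distribution observed for neighbors of hubs carries one extra
% factor of 1/K_1:
\[
  P(K_1 \mid K_0 > 100) \;\propto\; K_1^{-1.5} \cdot K_1^{-1} \;=\; K_1^{-2.5}.
\]
```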



When analyzing molecular networks one should consider possible sources 
of errors in the underlying data. Two-hybrid experiments give rise to false 
positives of two kinds. In one case the interaction between proteins is real but 
it never happens in the course of the normal life cycle of the cell due to spatial 
or temporal separation of participating proteins. In another case an indirect 
physical interaction is mediated by one or more unknown proteins localized 
in the yeast nucleus. Conversely, in high throughput two-hybrid screens one
should expect a sizeable number of false negatives. First, a binding may
not be observed if the conformation of the bait or prey heterodimer blocks
relevant interaction sites or if the corresponding heterodimer altogether fails
to fold properly. Second, 391 proteins out of the potential 5671 baits
were not tested as possible bait hybrids because they were found to activate
transcription of the reporter gene in the absence of any prey proteins. 

Unlike for the interaction network, our data for the genetic regulatory net- 
work do not come from a single large scale project. Instead, they represent 
a collection of numerous experiments performed by different experimental 
techniques in different labs. Therefore, it is not feasible even to list possi- 
ble sources of errors present in such a diverse data set. In particular one 
should worry about a hidden anthropomorphic factor present in such a net- 
work: some proteins just constitute more attractive subjects of research and 
are, therefore, relatively better studied than others. One should also note 
that the transcription regulation network is only a subset of a larger genetic 
regulatory network, which in addition to transcriptional regulation includes 
translational regulation, RNA editing, etc. An encouraging sign was that
when we separately analyzed the set representing the current knowledge
about this latter, more complete network, consisting of 1750 genetic regulations
among 848 proteins, we reproduced all of our empirical results for the
transcriptional network. 

The observed suppression of connections between nearest neighbors of
highly connected proteins is consistent with compartmentalization and modularity
characteristic of control of many cellular processes [11]. In fact, it
suggests the picture of functional modules of the cell organized around individual
hubs. To further test the extent of modularity of hubs and their
immediate neighborhood in each network we selected the 15 most highly connected
nodes. To provide an unbiased sample of hubs from the point of view of in- and
out-connectivity, about half of those nodes were selected as the highest out-degree
hubs (8 baits with $K_{bait} > 90$ for the interaction network and 7 nodes with
$K_{out} > 34$ for the regulatory network), while the rest were the highest in-degree
hubs (7 preys with $K_{prey} > 20$ for the interaction network and 8 nodes with
$K_{in} > 8$ for the regulatory network). In agreement with the correlation
properties described above, direct connections between hubs were signifi- 
cantly suppressed. In the interaction network we observed 20 links between 
different hubs in this group, which is significantly below 56 ± 7.5 links in the 
randomized network. In the transcription regulatory network there were 16
links between hubs in the real network as opposed to 35 ± 6.5 in its randomized
version. Not only are direct links between hubs suppressed in both studied
networks, but hubs also tend to share fewer of their neighbors with other
hubs, thereby extending their isolation to the level of next-nearest neighbor
connections. The total number of paths of length 2 between the set of 15 
hubs in the interaction network is equal to 418, whereas in the null model 
we measured this number to be 653 ± 56. Similarly, for the transcriptional 
network the number of paths of length 2 is equal to 186 in the real network, 
whereas from the null model one expects it to be 262 ± 30. Since the number 
of paths of length 2 between a pair of proteins is equal to the number of their 
common interaction partners one concludes that both the hub node itself and 
its immediate surroundings tend to separate from other hubs, reinforcing the 
picture of functional modules clustered around individual hubs. 
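
The two hub statistics used above, direct links between hubs and paths of length 2 between them, can be counted as in the sketch below. It assumes that paths of length 2 are counted as common neighbors of each unordered pair of hubs with edge direction ignored. The function names and the toy network are hypothetical. The `z_score` helper expresses the reported deviations in units of the null-model standard deviation.

```python
from collections import defaultdict
from itertools import combinations

def hub_statistics(edges, hubs):
    """Return (direct links between hubs, length-2 paths between hubs)."""
    hubs = set(hubs)
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:
            neighbors[a].add(b)
            neighbors[b].add(a)
    links = sum(1 for a, b in edges if a != b and a in hubs and b in hubs)
    # Each common neighbor of two hubs is one path of length 2 between them.
    two_paths = sum(
        len(neighbors[h1] & neighbors[h2]) for h1, h2 in combinations(hubs, 2)
    )
    return links, two_paths

def z_score(observed, null_mean, null_std):
    """Deviation from the null model in units of its standard deviation."""
    return (observed - null_mean) / null_std

toy = [("h1", "h2"), ("h1", "x"), ("h2", "x"), ("h1", "y"), ("h3", "y")]
links, two_paths = hub_statistics(toy, ["h1", "h2", "h3"])

# Deviations reported in the text for the interaction network:
z_links = z_score(20, 56, 7.5)    # direct hub-hub links
z_paths = z_score(418, 653, 56)   # paths of length 2 between hubs
```

With the numbers quoted in the text, both quantities in the interaction network lie more than four standard deviations below the null-model expectation.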

A further implication of the observed correlation is in the suppression of 
the propagation of deleterious perturbations over the network. It is reasonable
to assume that certain perturbations, such as significant changes in
the concentration of a given protein (including its vanishing altogether in a
null-mutant cell), can with a certain probability affect its first, second, and
sometimes even more distant neighbors in the corresponding network. While
the number of immediate neighbors of a node is by definition equal to its
own connectivity $K_0$, the average number of its second neighbors, given by
$K_0 \langle (K_1 - 1) \rangle_{K_0}$, is sensitive to correlation patterns of the network. Since
highly connected nodes serve as powerful amplifiers for the propagation of
deleterious perturbations it is especially important to suppress this propagation
beyond their immediate neighbors. It was argued that scale-free
networks in general are very vulnerable to attacks aimed at highly connected
nodes [12], [13]. The anticorrelation presented above implies a reduced branching
ratio around these nodes and thus provides a certain degree of protection
against such attacks. This may be the reason why the correlation between
the connectivity of a given protein and the lethality of the mutant cell lacking
this protein is not particularly strong.
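
The effect of the anticorrelation on the branching around hubs can be made explicit with a rough estimate. The sketch below assumes the fitted form $\langle K_1 \rangle \propto K_0^{-0.6}$ reported in the text and neglects the $-1$ term, which is only reasonable while $\langle K_1 \rangle \gg 1$. The symbol $N_2$ is introduced here for illustration.

```latex
% Average number of second neighbors of a node with connectivity K_0:
\[
  N_2(K_0) \;=\; K_0 \,\langle K_1 - 1 \rangle_{K_0}.
\]
% Uncorrelated network: <K_1> does not depend on K_0, so
\[
  N_2(K_0) \;\propto\; K_0 .
\]
% With the observed decline <K_1> ~ K_0^{-0.6} (and <K_1> >> 1):
\[
  N_2(K_0) \;\approx\; K_0 \,\langle K_1 \rangle_{K_0}
  \;\propto\; K_0 \cdot K_0^{-0.6} \;=\; K_0^{0.4},
\]
% i.e. the second neighborhood of a hub grows sublinearly with K_0.
```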

It is plausible that molecular networks in a living cell have organized themselves
in an interaction pattern that is both robust and specific. Topologically
the specificity of different functional modules can be enhanced by limiting
interactions between hubs and suppressing the average connectivity of their
neighbors. We have seen that such a correlation pattern appears in a similar
way in two different layers of molecular networks in yeast, and thus presumably
is a universal feature of all molecular networks operating in living
cells. 

Figure 1: Network of physical interactions between nuclear proteins. Here we
show the part of the network consisting of all proteins that
are known to be localized in the yeast nucleus, and which interact with at
least one other protein in the nucleus. This subset consists of 318 interactions 
between 329 proteins. Note that most neighbors of highly connected nodes 
have rather low connectivity. 

Figure 2: Correlation profiles of protein interaction and regulatory networks
in yeast. (A) The ratio $P(K_0, K_1)/P_r(K_0, K_1)$, where $P(K_0, K_1)$ is the probability
that a pair of proteins with total numbers of interaction partners given
by $K_0$ and $K_1$, respectively, directly interact with each other in the full
interaction data set, while $P_r(K_0, K_1)$ is the same probability in a randomized version of
the same network. (B) The same as (A) but for a protein with in-degree
$K_{in}$ to be regulated by one with out-degree $K_{out}$ in the transcription
regulatory network. (C) Z-scores for connectivity correlations from (A):
$Z(K_0, K_1) = (P(K_0, K_1) - P_r(K_0, K_1))/\sigma_r(K_0, K_1)$, where $\sigma_r(K_0, K_1)$ is the
standard deviation of $P_r(K_0, K_1)$ in 1000 realizations of a randomized network.
(D) As in (C) but for incoming and outgoing links in the transcription
regulatory network. To improve statistics the connectivities in all
four panels of Fig. 2 were logarithmically binned into 2 bins per decade.

Figure 3: Correlations in connectivities of neighbors. (A) The average connectivity
$\langle K_1 \rangle$ of nearest neighbors of proteins with the connectivity $K_0$ in the
physical interaction network (triangles) and the regulatory network (squares).
The solid line is a power law fit, $\propto 1/K_0^{0.6}$. (B) The probability distribution
of connectivities $K_1$ in the physical interaction network calculated separately
for neighbors of proteins with small connectivity $K_0 < 3$ (squares), and
with large connectivity $K_0 > 100$ (circles). Lines are power laws $\propto 1/K_1^{1.5}$
(dashed) and $\propto 1/K_1^{2.5}$ (solid).